Abstract

We investigate the problem of maintaining an encoded distributed storage system when some nodes contain adversarial errors. Using the error-correction capabilities that are built into the existing redundancy of the system, we propose a simple linear hashing scheme to detect errors in the storage nodes. Our main result is that for storing a data object of total size $\mathcal{M}$ using an $(n, k)$ MDS code over a finite field $\mathbb{F}_q$, up to $t_1 = \lfloor (n-k)/2 \rfloor$ errors can be detected, with probability of failure smaller than $1/\mathcal{M}$, by communicating only $O(n(n-k)\log \mathcal{M})$ bits to a trusted verifier. Our result constructs small projections of the data that preserve the errors with high probability and builds on a pseudorandom generator that fools linear functions. The transmission rate achieved by our scheme is asymptotically equal to the min-cut capacity between the source and any receiver.

I. Introduction

We study the security and data integrity of distributed storage systems that use coding for redundancy. It is well known that maximum distance separable (MDS) codes can offer maximum reliability for a given storage overhead and can be used for distributed storage in data centers and peer-to-peer storage systems like OceanStore [1], Total Recall [2], and FS2You [3], that use nodes across the Internet for distributed file storage and sharing. In this paper we are interested in dealing with errors in the encoded representation. The errors could be introduced either through (unlikely) undetected hard drive failures or through a malicious or compromised server in the storage network.

This second threat is much more pressing when the system uses network coding to maintain the redundancy of the encoded system, as proposed recently [4]. To illustrate this, consider a large data object of size $\mathcal{M}$ bits. If this object is to be stored on $n$ servers, depending on the desired redundancy, an $(n, k)$ linear MDS code can be used, dividing the object into $k$ packets of size $\mathcal{M}/k$ each, and storing an encoded packet at each server. Assuming the code is over a finite field $\mathbb{F}_q$, requiring $\log q$ bits to represent each symbol, each server will also need to keep a header denoting the coding coefficients of the linear combinations stored on the server (see Section II for the details), and the size of this header is larger than the size of the useful data if the code is used only once. For this reason it was proposed [?] that the same code be used several times by dividing each
packet into $N$ symbols of $\log q$ bits and repeating the same code $N$ times. If $N \gg n$ the overhead of storing the coefficients becomes negligible. We refer to this as the $N$-extended version of an MDS code, shown in Figure 2 for the (4, 2) code used in Figure 1.

Fig. 1. A (4, 2) MDS code along with the repair of the first storage node. Each node stores two packets and any two nodes contain enough information to recover all four data packets. In this example the first node leaves the system and a new node is formed by communicating linear combinations $f_2, f_3, f_4$, which can be used to solve for $x_1, x_2$ at the new node.

Observe that in this example each node is storing two linear combinations (rows) as opposed to one. This sub-packetization is performed to facilitate repair through network coding, as proposed in [?]. The problem of repair consists of constructing a new encoded node by accessing as little information as possible from the existing encoded nodes. In the example of Figure 1 we assume that the first storage node failed and the redundancy of the system needs to be refreshed. This is achieved by communicating "small" linear combinations $f_2, f_3, f_4$ of the encoded packets from nodes 2, 3, and 4, each of size 1/2 of what each node is storing, which, as proven in [?], is information-theoretically minimal. As storage nodes leave the system and new ones are added, this forms a dynamic storage network that keeps a fixed redundancy and reliability by building new encoded packets from already existing ones. The problem of security should now be clear: if even a single node in this storage system is compromised and participates in this repair process, it can send incorrect linear combinations that will create erroneous packets at the new nodes. All new nodes using these linear equations will have incorrect data, and soon the whole system will be contaminated with nodes having erroneous data.



Our contribution: Since the problem of repairing a code is equivalent to wireline network coding [4], existing techniques for network error correction can be used to detect and correct the errors [6], [7]. These techniques are designed to work for general networks and always guarantee a transmission rate of $C - 2z$, where $C$ is the min-cut capacity from the source to the destination and $z$ is the number of links contaminated by the adversary. Our approach, which creates and communicates small linear hashes that preserve the structure of the code, allows the detection of errors and achieves a transmission rate that can be asymptotically equal to $C$ (by having the receiver connect to all the non-erroneous nodes), since it takes advantage of the specific structure of the network and the set of links an adversary can contaminate.

To explain our scheme, consider the (4, 2) MDS code of Figure 1 and assume one of the four nodes contains errors (say in both rows). A trusted verifier that communicates with all four nodes can find this error by collecting the 8 equations stored at the nodes (two per node) and solving the system formed by each of the $\binom{4}{2} = 6$ node pairs. Since this is a (4, 2) MDS code, the combinations of equations that come from error-free nodes will be full rank and give a consistent solution, whereas the other sets will give different solutions (or might not even be full rank). This is, of course, just using the error-correction capability of the code to detect an error. Our contribution involves applying this idea to the $N$-extended version of a code, by creating a linear projection (hash) of each row on the same random vector. The key observation is that if the same random projection is used, the hashes themselves form a codeword of an error-correcting code, and these hashes can be communicated to the verifier. The benefit is that each hash has size only $1/N$ of the data in each row, reducing the amount of communication to the verifier. One complication is that each node needs to project its data on the same random vector of length $N$, which requires $N \log q$ bits of common randomness. Subsequently the problem at the verifier is to decode an error-correcting code under adversarial errors. This decoding task can be computationally inefficient, but we do not address this issue here; we assume that the verifier can detect the errors if they are within the error-correcting capability of the code as dictated by the minimum distance (half the minimum distance). Our analysis investigates under which conditions the small projected hash code will detect any error in the large amount of data stored at the nodes. In particular, we prove the following:
Theorem 1: In a distributed storage system storing a total of $\mathcal{M}$ bits, using an $N$-extended $(n, k)$ MDS code over $\mathbb{F}_q$, with the $n$ storage nodes sharing $O(\mathcal{M})$ bits of common randomness, our random hashing scheme can detect up to $t \le t_1 = \lfloor (n-k)/2 \rfloor$ errors by communicating a total of $n(n-k)(\log \mathcal{M} + \log t_1)$ bits to a verifier, with probability of failure

$$P[F] \le \frac{1}{\mathcal{M}}.$$

One important weakness of the previous result is the large amount of common randomness required, which is comparable to the total size of the data object stored (a $1/(k(n-k))$ fraction of the $\mathcal{M}$ bits). Note that these bits do not have to be secret; they only need to be realized after the error has been introduced to the new disk. Their large number, however, makes it impractical to generate them at one node and then communicate them to the others. Our second contribution involves showing how to use only $O(\log \mathcal{M})$ bits of common randomness to achieve almost the same performance:

Theorem 2: In a distributed storage system storing a total of $\mathcal{M}$ bits, using an $N$-extended $(n, k)$ MDS code over $\mathbb{F}_q$, with the $n$ storage nodes sharing $O(\log \mathcal{M})$ bits of common randomness, our pseudorandom hashing scheme can detect up to $t_1 = \lfloor (n-k)/2 \rfloor$ errors by communicating a total of $O(n(n-k)\log \mathcal{M})$ bits to the verifier, with probability of failure

$$P[F'] \le \frac{1}{\mathcal{M}}.$$

Fig. 2. Illustration of the 3-extended version of the (4, 2) MDS code shown in Figure 1. Each of the three columns stored on the source nodes is coded by repeatedly using the (4, 2) MDS code of Figure 1. During verification, each row is projected on the vector $r^T = (1\ 1\ 1)$ and the corresponding products $s_1, \ldots, s_4$ form a codeword of the initial (4, 2) MDS code. For example, the errors at the first row of the first node will not be absorbed by the projection as long as $(e_{11}\ e_{12}\ e_{13}) \cdot (1\ 1\ 1)^T \neq 0$.



If there is no common randomness, the verifier can generate the $O(\log \mathcal{M})$ random bits and communicate these to all the nodes, requiring a total of $O(n \log \mathcal{M})$ extra communicated bits.

Notice that in this case the total number of bits communicated scales only logarithmically in $\mathcal{M}$, to achieve a probability of failure that scales like $1/\mathcal{M}$. Our construction relies on the pseudorandom small-bias generator used in [8], which can expand $\log N$ random symbols of $\mathbb{F}_q$ (which require $\log N \log q$ random bits to generate) into $N$ pseudorandom symbols that can "fool" any linear function¹. The only modification to our algorithm is projecting each stored row on this pseudorandom vector to generate each hash, and this induces only a small addition to the probability of error. Notice that our work does not rely on any cryptographic assumptions and guarantees that errors inserted in the distributed storage system will be detected with high probability if they are within the capabilities of the code used.

Using the error-correction capability of the code for distributed storage has been suggested before as a way to detect errors [10], [11] and identify "free riders" within the network. A different approach to finding errors injected in distributed storage and content distribution systems is the use of signatures and hash functions. Reference [12] introduced the use of homomorphic hashing functions that enable a node to perform on-the-fly verification of erasure-encoded blocks. Gkantsidis et al. [13] used the computationally less expensive secure random checksums to detect polluted packets in content distribution systems that use network coding, while [14], [15] used a method of subspace signatures based on different cryptographic primitives. See also [16], [17], [18] for other related work on security and distributed storage.



II. Model 

As stated, we consider a data object of size $\mathcal{M}$ bits that is divided into $k$ pieces (of size $\mathcal{M}/k$ bits each), and these are coded into $n$ ($> k$) encoded pieces through a linear $(n, k)$ maximum distance separable (MDS) code. These encoded pieces are stored on $n$ distinct storage nodes along with a header denoting the exact linear combination saved at all the storage nodes. Since the size of the code $(n, k)$ will be much smaller than $N$, the overhead of storing the code description everywhere (including the verifier) is minimal. This simplifies the model, and we can now assume that the errors occur only at the data, since an error at the header would be immediately detected.

We assume that the original information (of size $\mathcal{M}$ bits) is organized into a matrix $X$ with $k(n-k)$ rows and $N$ columns. The elements of this matrix are elements of the finite field $\mathbb{F}_q$, i.e., $X \in \mathbb{F}_q^{k(n-k)\times N}$, where $q$ is a prime or an integer power of a prime. Each column $X_i \in \mathbb{F}_q^{k(n-k)\times 1}$ ($i \in \{1, \ldots, N\}$) of matrix $X$ will be separately encoded with the use of an $(n, k)$ MDS code with generator matrix $G \in \mathbb{F}_q^{n(n-k)\times k(n-k)}$, and all the columns $G X_i \in \mathbb{F}_q^{n(n-k)\times 1}$ derived by this encoding will be stored on the $n$ different storage nodes of the distributed storage system. We refer to this code, applied to the $N$ different columns of matrix $X$, as the $N$-extended MDS code. The overall effect that the $N$-extended MDS code has upon the information matrix $X$ is captured by the matrix multiplication $GX$. Figure 2 shows such a code for $N = 3$, where the MDS code used is the same as the one shown in Figure 1.

The storage nodes of the distributed storage system are assumed to have limited computational capabilities allowing them 
only to perform inexpensive operations over the finite field $\mathbb{F}_q$. Some of these storage nodes are assumed to store erroneous
information, where these errors might be either random due to hardware failures or inserted adversarially by a malicious user. 
The malicious user can be computationally unbounded, have knowledge of all the information stored on the distributed storage 
system and can insert errors to any t of the storage nodes. 

We assume the existence of a special node called the verifier that is assigned to check the integrity of the data stored on 
different storage nodes. The verifier does not have access to the initial data object (other than the description of the code) and 
therefore has to rely on the communicated information to check which nodes contain errors. 

III. Random Hashes 

A. Illustrating example 

Assume that in the distributed storage system shown in Figure 2 with four storage nodes it is known that one of them (the first in this example) stores erroneous information. The goal of the verifier that oversees the state of the whole system is first to find the erroneous disk with the minimum data exchange and second to repair it by using the information stored on the other disks. Since all three columns stored on the distributed storage system are codewords of a (4, 2) MDS code with at most one error (some columns might be error free) and minimum distance $d = 3$, the naive approach to finding the erroneous disk is to download all data from the different disks and then use minimum distance decoding on each separate column.

¹ First introduced by Naor and Naor in [9] for linear functions in $\mathbb{F}_2$.



The naive approach would certainly find the faulty disk but it would require the transfer of double the size of the file stored ($\frac{n}{k}\mathcal{M}$ bits of information in general). So as the size of the file increases this approach becomes prohibitively expensive in bandwidth. Instead of transmitting all the information stored on the distributed storage system, the central node could choose a vector with each component chosen independently and uniformly at random from $\mathbb{F}_q$ and have each storage node transmit the inner product (called the hash product) between the randomly chosen vector and each of the rows stored at the disks. In the absence of errors, these hash products will form a codeword of the MDS code used to encode the different columns of the information matrix. In case there are errors, as in the case of the first node in Figure 2, the multiplication with the vector $r$ (here $r^T = (1\ 1\ 1)$, as in Figure 2) will not obscure these errors unless $E_i r = 0 \Leftrightarrow e_{i1} + e_{i2} + e_{i3} = 0$ for $i \in \{1, 2\}$, where $E_i$ denotes the $i$-th row of errors. The reason why the chosen vector should be random is so that the adversary cannot deliberately choose the errors to make them "disappear" after the vector multiplication.
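
The example above can be made concrete with a short sketch. It is an illustration under simplifying assumptions, not the construction used in the paper: it works over the small prime field with $p = 257$, uses a Vandermonde generator matrix as a stand-in (4, 2) MDS code, stores a single row per node instead of two, and decodes by brute force. All function names are hypothetical.

```python
import itertools

P = 257  # small prime standing in for the field size q (assumption)


def vandermonde(n, k, p=P):
    """Generator matrix of an (n, k) MDS code: row i evaluates a polynomial at i + 1."""
    return [[pow(a, j, p) for j in range(k)] for a in range(1, n + 1)]


def encode(G, X, p=P):
    """N-extended encoding: every column of X is encoded with G, i.e. Y = G X."""
    return [[sum(g * x for g, x in zip(row, col)) % p for col in zip(*X)] for row in G]


def hash_rows(Y, r, p=P):
    """Hash product of each stored row with the common vector r."""
    return [sum(y * ri for y, ri in zip(row, r)) % p for row in Y]


def solve_mod(A, b, p=P):
    """Solve the invertible square system A x = b over GF(p) by Gauss-Jordan elimination."""
    k = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        piv = next(i for i in range(c, k) if M[i][c] % p)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)  # modular inverse (Python 3.8+)
        M[c] = [v * inv % p for v in M[c]]
        for i in range(k):
            if i != c and M[i][c]:
                f = M[i][c]
                M[i] = [(vi - f * vc) % p for vi, vc in zip(M[i], M[c])]
    return [M[i][k] for i in range(k)]


def locate_errors(G, h, t, p=P):
    """Brute-force decoding: return the nodes whose hash disagrees with a
    codeword within Hamming distance t of h, or None if there is none."""
    n, k = len(G), len(G[0])
    for subset in itertools.combinations(range(n), k):
        x = solve_mod([G[i] for i in subset], [h[i] for i in subset], p)
        codeword = [sum(g * xi for g, xi in zip(G[i], x)) % p for i in range(n)]
        bad = {i for i in range(n) if codeword[i] != h[i]}
        if len(bad) <= t:
            return bad
    return None


G = vandermonde(4, 2)
X = [[5, 7, 11], [2, 3, 4]]   # k x N information matrix, N = 3
Y = encode(G, X)
r = [1, 1, 1]                 # projection vector of Figure 2

Y_visible = [row[:] for row in Y]
Y_visible[0] = [(y + e) % P for y, e in zip(Y[0], [1, 2, 3])]      # e . r = 6, non-zero
Y_hidden = [row[:] for row in Y]
Y_hidden[0] = [(y + e) % P for y, e in zip(Y[0], [1, P - 1, 0])]   # e . r = 0 mod P

print(locate_errors(G, hash_rows(Y_visible, r), t=1))  # node 0 is flagged
print(locate_errors(G, hash_rows(Y_hidden, r), t=1))   # error absorbed, nothing flagged
```

The second call shows why a fixed, publicly known vector is not enough: an adversary who knows $r$ can pick an error row orthogonal to it, which is exactly what drawing $r$ at random after the errors are written is meant to prevent.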

B. General case 

The initial information matrix $X \in \mathbb{F}_q^{k(n-k)\times N}$ is coded with the use of an $N$-extended MDS code with generator matrix $G \in \mathbb{F}_q^{n(n-k)\times k(n-k)}$. Some of the storage nodes contain errors, and therefore what is actually stored on the distributed storage system is $Y = GX + E$, where $Y, E \in \mathbb{F}_q^{n(n-k)\times N}$ and $E$ is the error matrix. The verifier wants to identify all erroneous disks by sending hash product requests to all nodes. In this setting Theorem 1 holds.

Proof of Theorem 1: All storage nodes share $N \log q$ bits of common randomness and therefore they can create the same random vector $r \in \mathbb{F}_q^{N\times 1}$, with each component of $r$ drawn uniformly at random from $\mathbb{F}_q$. After the random vector $r$ is computed, each storage node calculates the hash product (inner product) between $r$ and its content on every row. These $n(n-k)$ hash products are equal to:

$$H = Yr = (GX + E)r \;\Rightarrow\; H = G(Xr) + e \qquad (1)$$

where $e = Er \in \mathbb{F}_q^{n(n-k)\times 1}$ is a column vector whose non-zero components can only occur in the positions that correspond to the at most $t_1$ storage nodes with errors. The key observation is that the projection will not reveal an error at a specific row if vector $r$ is orthogonal to that row of $E$. Intuitively, a randomly selected $r$ will be non-orthogonal to an arbitrary non-zero row of $E$ with high probability, and this is the probability we need to analyze.

From equation (1) it can be seen that the order of applying the MDS encoding on the different columns of the information matrix $X$ and the calculation of the hash products can be interchanged ($(GX)r = G(Xr)$), making the process of identifying the erroneous disks equivalent to finding the error positions in a regular MDS code. This is guaranteed to succeed if the minimum distance of the code ($n - k + 1$) is larger than twice the number of errors $2t$ (which is indeed satisfied by the assumptions of Theorem 1).

The set of indices that correspond to the components of vector $e$ that come from disk $i$ is $R_i = \{(i-1)(n-k)+1, \ldots, i(n-k)\}$. We are interested in vector $e$ since this gives us the positions of the faulty disks. One complication that might arise is that disk $i$ might contain an error, meaning that the rows $\{E_j, j \in R_i\}$ of the error matrix $E$ are not all zero, whereas the corresponding components of vector $e$ ($\{e_j, j \in R_i\}$) turn out to be zero, and therefore our scheme fails to detect that error. Assume that the set of erroneous disks is $W \subseteq \{1, 2, \ldots, n\}$ and define $P[F]$ to be the probability of failing to detect some errors. We get
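
The displayed bound that the proof refers to as equation (2) is missing from this copy of the text. A reconstruction consistent with the surrounding argument (a union bound over the erroneous disks, with $W$ denoting the set of erroneous disks, $|W| \le t_1$, and $E_j$ the $j$-th row of $E$) would read as follows; it should be treated as a sketch rather than the authors' exact expression:

$$P[F] = P\Big[\bigcup_{i \in W} \bigcap_{j \in R_i} (E_j r = 0)\Big] \le \sum_{i \in W} P\Big[\bigcap_{j \in R_i} (E_j r = 0)\Big] \overset{(*)}{\le} \sum_{i \in W} \frac{1}{q} \le \frac{t_1}{q} \qquad (2)$$
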
where inequality (*) holds due to the fact that the probability that some storage node with errors produces zero hash products is at most $1/q^{l}$, where $l$ is the number of linearly independent error rows saved at its disk. So assuming that the adversary has produced linearly dependent error rows can only increase the probability of failure.

If the adversary has saved error vectors at storage node $i$ with rank 1, then the probability $P[\bigcap_{j \in R_i} (E_j r = 0)]$ in equation (2) reduces to an equation for a single row (say row $k$):
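
The displayed equation for the single-row case is also missing here. Writing $e_{ki}$ for the entries of row $k$ of $E$, $r_i$ for the components of $r$, and $e_{kf}$ for one fixed non-zero entry of that row, a reconstruction consistent with the explanation that follows (an assumption, not the authors' exact expression) is:

$$P[E_k r = 0] = P\Big[\sum_{i:\, e_{ki} \neq 0} e_{ki} r_i = 0\Big] = P\Big[r_f = -\sum_{i \neq f:\, e_{ki} \neq 0} \frac{e_{ki}}{e_{kf}}\, r_i\Big] = \frac{1}{q} \qquad (3)$$
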
where we only took the terms with a non-zero error coefficient $e_{ki}$. The numbers $(e_{ki}/e_{kf})\, r_i$ ($e_{kf}$ is any non-zero error element from the $k$-th row) are independent and uniform over $\mathbb{F}_q$, and so is their sum according to Lemma 1. So the last equality holds since two independent random numbers uniformly distributed over $\mathbb{F}_q$ are equal with probability $1/q$.

When the errors have rank $l > 1$, the probability $P[\bigcap_{j \in R_i} (E_j r = 0)]$ can be evaluated by disregarding the linearly dependent rows. By looking only at the linearly independent ones and by choosing $l$ columns we can form an invertible submatrix $\tilde{E} \in \mathbb{F}_q^{l\times l}$, and similarly to the previous analysis we have that $P[\bigcap_{j \in R_i} (E_j r = 0)] = P[\tilde{E}\tilde{r} = b]$, where $\tilde{r}, b \in \mathbb{F}_q^{l\times 1}$ and $\tilde{r}$ contains the components of the random vector that correspond to the columns from which the submatrix $\tilde{E}$ was formed. Since $b$ is uniformly random, due to the previous analysis $P[\tilde{E}\tilde{r} = b] = 1/q^{l}$.

Each of the $n$ storage nodes has to convey to the verifier the result of the hash product from all its $(n-k)$ rows, so that the total size of the hashes communicated is $\mathcal{H} = n(n-k)\log q$ bits, whereas the size of the file is $\mathcal{M} = k(n-k)N\log q$ bits. By setting the field size $q$ equal to $t_1\mathcal{M}$ we conclude the proof of Theorem 1. ■
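
To get a feel for the savings, the two communication costs can be compared numerically. The sketch below is only illustrative: the function names and the example parameters are hypothetical, logarithms are taken base 2, and the requirement that $q$ be a prime power is ignored when setting $q = t_1\mathcal{M}$.

```python
import math


def t1(n, k):
    """Number of erroneous nodes the scheme is stated to handle: floor((n - k) / 2)."""
    return (n - k) // 2


def hash_cost_bits(n, k, file_bits):
    """Bits sent to the verifier: n(n-k) hashes of log2(q) bits each, with q = t1 * M."""
    q = t1(n, k) * file_bits
    return n * (n - k) * math.log2(q)


def naive_cost_bits(n, k, file_bits):
    """Bits sent when every node uploads its entire content: (n / k) * M."""
    return n * file_bits / k


n, k, file_bits = 14, 10, 2**30   # hypothetical parameters: a (14, 10) code, 2^30-bit object
print(hash_cost_bits(n, k, file_bits))    # 14 * 4 * 31 = 1736.0
print(naive_cost_bits(n, k, file_bits))   # roughly 1.5e9
```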

Lemma 1: The sum of any number of independent uniformly distributed random variables gives a uniformly distributed 
random variable. 

Proof: Without loss of generality we will prove Lemma 1 only for the case of two random variables. Assume that $x, y \in \mathbb{F}_q$ are two independent and uniformly distributed random variables. We will prove that $x + y$ is also uniformly distributed; indeed,
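
The displayed computation is missing from this copy. For any fixed $z \in \mathbb{F}_q$, a reconstruction consistent with the remark that follows would be:

$$P[x + y = z] = \sum_{a \in \mathbb{F}_q} P[x = a,\ y = z - a] \overset{(*)}{=} \sum_{a \in \mathbb{F}_q} P[x = a]\,P[y = z - a] = q \cdot \frac{1}{q} \cdot \frac{1}{q} = \frac{1}{q}$$
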
where equality (*) holds due to the independence between $x$ and $y$. ■

Before we continue to prove Theorem 2 we need to give the following definition (an extension of Definition 2.1 in [8] to non-prime $q$):

Definition 1: a) Let $q$ be a prime or an integer power of a prime. For a random variable $X$ with values in $\mathbb{F}_q$, let the bias of $X$ be defined by

$$\mathrm{bias}(X) = (q-1)\,P[X = 0] - P[X \neq 0].$$

A random variable $X \in \mathbb{F}_q$ is $\epsilon$-biased if $|\mathrm{bias}(X)| \le \epsilon$.

b) The sample space $S \subseteq \mathbb{F}_q^{l}$ is $\epsilon$-biased if for all $c \in \mathbb{F}_q$ and each sequence $\beta = (\beta_1, \ldots, \beta_l) \in \mathbb{F}_q^{l} \setminus \{0^l\}$ the following is valid: if a sequence $X = (x_1, \ldots, x_l) \in S$ is chosen uniformly at random from $S$, then the random variable $\left(\sum_{i=1}^{l} \beta_i x_i + c\right)$ is $\epsilon$-biased.
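
As a quick check of the definition (a worked step added for illustration): since $P[X \neq 0] = 1 - P[X = 0]$, the bias can be rewritten in terms of $P[X = 0]$ alone,

$$\mathrm{bias}(X) = (q-1)\,P[X = 0] - \big(1 - P[X = 0]\big) = q\,P[X = 0] - 1,$$

so a uniformly distributed $X$ has bias $0$, and an $\epsilon$-biased $X$ satisfies $P[X = 0] \le (1 + \epsilon)/q$.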

Proof of Theorem 2: All storage nodes execute the algorithm described in Proposition 4.1² of [8] and produce a pseudorandom vector $r' \in \mathbb{F}_q^{N\times 1}$ with $N$ components. The quantity $m$ in the algorithm (and consequently the field size of $\mathbb{F}_{q^m}$ too) is chosen so that the bias $(q-1)(N-1)/q^m$ is equal to 1, and therefore $q^m = (q-1)(N-1)$ or $m = O(\log N)$. The seed that needs to be provided at all the storage nodes so that they can start the algorithm consists of two elements from $\mathbb{F}_{q^m}$ chosen uniformly at random, or equivalently $2m \log q = O(\log N)$ random bits.

Once all storage nodes have constructed the same pseudorandom vector $r'$ they compute the inner product between $r'$ and the content stored on each row of the storage nodes. These pseudorandom products are all sent to the verifier to identify the erroneous disks. The whole analysis is identical to the proof of Theorem 1, with one major difference in the calculation of the failure probability $P[F']$. For the case of a pseudorandom vector $r'$, using the same notation as in the proof of Theorem 1,
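
The displayed bound is missing from this copy. One bound that is consistent with the remark that follows, and with the notation of the proof of Theorem 1 ($W$ the set of erroneous disks, $|W| \le t_1$), is sketched below; it bounds the probability that all rows of disk $i$ are absorbed by the probability that one non-zero row $E_j$ is absorbed, and it may differ from the authors' exact expression:

$$P[F'] \le \sum_{i \in W} P\Big[\bigcap_{j \in R_i} (E_j r' = 0)\Big] \overset{(*)}{\le} \sum_{i \in W} \frac{2}{q} \le \frac{2 t_1}{q}$$
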
where inequality (*) holds since $P(E_j r' = 0) \le 2/q$. Indeed, the bias of the space constructed by the pseudorandom procedure is 1, which means:

$$|q\,P(E_j r' = 0) - 1| \le 1 \;\Rightarrow\; P(E_j r' = 0) \le \frac{2}{q}.$$

By setting $q = 2(n-k)\,t_1\mathcal{M}$ we conclude the proof. ■

² This algorithm is described for $q$ prime but it is readily extensible to $q$ equal to an integer power of a prime.

We would like to underline here that both theorems above exhibit the same behavior in the probability of failure. In Theorem 2 the size of the required common randomness is decreased at the expense of an increased field size. Moreover, the use of pseudorandom generators incurs an additional computational cost at each storage node of $O(Nm^2)$ or $O(\mathcal{M}\log \mathcal{M})$ operations in $\mathbb{F}_q$ to generate the pseudorandom vector $r'$.